
Conversation

vjanfaza

The Compute-Context-Length (CCL) technique optimizes the throughput of large language models (LLMs) on Qualcomm devices when handling very large context lengths. The current Ahead-Of-Time (AOT) compilation on Qualcomm devices cannot predict how many tokens will actually be needed, leading to significant throughput drops during the prefill and decode phases, because the system performs attention calculations over the full compiled context length. To address this, we introduce Compute Context Length (CCL), an additional ONNX variable that allows dynamic context-length specialization. By generating tokens with smaller, more manageable context lengths (CCL), we reduce memory reads and attention computation, thereby improving throughput.
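
To illustrate the idea, here is a minimal PyTorch sketch, not the actual QEfficient implementation: the name `attention_with_ccl`, the `ccl` parameter, and the tensor layout are illustrative assumptions. During decode it reads only the first `min(cache_len, ccl)` cached positions instead of the full allocated context length, which is where the savings in memory reads and attention compute come from.

```python
import torch

def attention_with_ccl(query, k_cache, v_cache, cache_len, ccl):
    # Attend over only the first min(cache_len, ccl) cached positions
    # rather than the full allocated context length.
    effective_len = min(cache_len, ccl)
    k = k_cache[:, :, :effective_len, :]  # (batch, heads, effective_len, head_dim)
    v = v_cache[:, :, :effective_len, :]
    scores = query @ k.transpose(-1, -2) / (query.shape[-1] ** 0.5)
    probs = torch.softmax(scores, dim=-1)
    return probs @ v

# Example: a single decode-step query against a cache allocated for
# 4096 positions, with CCL capping the attention window at 512.
b, h, d = 1, 8, 64
q = torch.randn(b, h, 1, d)
k_cache = torch.randn(b, h, 4096, d)
v_cache = torch.randn(b, h, 4096, d)
out = attention_with_ccl(q, k_cache, v_cache, cache_len=300, ccl=512)
```

With `cache_len=300`, the sketch reads 300 KV positions rather than all 4096. In the actual technique, once generation grows past the current CCL the model would switch to the next, larger context-length specialization rather than truncating the cache; the sketch only shows the reduced read and compute pattern within one specialization.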

@vjanfaza vjanfaza reopened this Oct 8, 2025
vjanfaza and others added 16 commits October 7, 2025 17:13
quic#557)

Updated the run_vlm_kv_model_on_pytorch and run_vlm_kv_model_on_ort
methods to run with the latest dual QPC setup, along with the required
changes to the Input Handler of VLMs.

Also updated the way head_dim is calculated for past_key_value creation,
since certain models now provide an explicit head_dim in their config. We
fall back to the previous derivation if the parameter isn't found (see the
sketch after this commit message).

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
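
A minimal sketch of the head_dim fallback described above, assuming a Hugging Face style config object; `get_head_dim` is a hypothetical helper name, not code from this PR:

```python
def get_head_dim(config):
    # Prefer an explicit head_dim if the model config provides one,
    # as certain newer models do.
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        # Fall back to the previous derivation from the hidden size
        # and the number of attention heads.
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim
```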
